Version

Version: 0.1
Date: 3.4.2016

Introduction

FiPi stands for Financial Processing instructions. FiPi is a domain-specific language (DSL) that lets you define data sources and pre-processing steps for financial timeseries. For example, you could define a timeseries AAPL as the daily close price of the Apple stock, sourced from Bloomberg and adjusted for dividends.

FiPi is lightweight and technology-neutral.

The main benefits of using FiPi are:

Concepts

Timeseries

FiPi is all about financial timeseries. In FiPi, a timeseries is a well-defined entity with the following two properties:

  • it has an ID
  • it has a semantic meaning

For example, AAPL is not simply "the Apple Inc. stock price"; it is, say, the daily close price of the Apple Inc. stock, sourced from Bloomberg, adjusted for dividends.

These two properties (ID and semantic meaning) ensure that when you retrieve AAPL within your context, you know exactly what you are getting. This does not mean that two subsequent calls will return the same data, as new data points or corrections to past data may result in differences.

Context

A FiPi Context is a well-defined set of timeseries.

Each FiPi Context is defined in a separate FiPi Definition File.

Within a context, each ID must be unique.

However, it is common that a single financial instrument is represented more than once in a context. For example, you might have a timeseries AAPL for the unadjusted timeseries, and AAPL_a for the adjusted timeseries.
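The relationship between timeseries and contexts can be sketched in Python as follows. This is a minimal illustration only: the class names `TimeseriesDef` and `Context` and the `add` method are assumptions made for this example and are not part of FiPi.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimeseriesDef:
    """A timeseries definition: an ID plus its semantic meaning."""
    id: str
    meaning: str


@dataclass
class Context:
    """A set of timeseries definitions with unique IDs (hypothetical model)."""
    series: dict = field(default_factory=dict)

    def add(self, ts: TimeseriesDef) -> None:
        # Reject duplicates: within a context, each ID must be unique.
        if ts.id in self.series:
            raise ValueError(f"duplicate timeseries ID: {ts.id}")
        self.series[ts.id] = ts


# One instrument, represented twice under two distinct IDs.
ctx = Context()
ctx.add(TimeseriesDef("AAPL", "daily close, Bloomberg, unadjusted"))
ctx.add(TimeseriesDef("AAPL_a", "daily close, Bloomberg, adjusted for dividends"))
```

Adding a second definition with the ID AAPL to the same context would raise an error, while a separate context is free to reuse that ID with a different meaning.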

Conversely, it is also very common to have multiple contexts for a single software application. For example, you could have a context for each of these modules:

  • data acquisition: this context defines the timeseries that need to be downloaded from data providers, together with validation and cleaning instructions
  • trading: this context defines, for example, actual futures contracts
  • research: this context could define rolled futures series, together with the instructions to stitch them together from individual futures contracts

Instructions

Often, financial data needs to be pre-processed before it can be fed into an algorithm, used for a visualization, or saved into a database.

For example, data needs to be validated and cleaned. In FiPi, these processing steps are called instructions. You can think of each instruction as a function whose input is one or more timeseries and whose output is the transformed timeseries.
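To make the "instruction as a function" idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a timeseries is a dict from ISO date strings to values; the `Series` and `Instruction` type aliases and the `returns` function are hypothetical names, not something FiPi prescribes.

```python
from typing import Callable, Dict

# Assumption: a timeseries is modelled as a dict from ISO date string to value.
Series = Dict[str, float]
# An instruction takes one or more series and returns a transformed series.
Instruction = Callable[..., Series]


def returns(prices: Series) -> Series:
    """Derive simple period-over-period returns from prices."""
    # ISO date strings sort chronologically, so sorted() gives time order.
    dates = sorted(prices)
    return {
        cur: prices[cur] / prices[prev] - 1.0
        for prev, cur in zip(dates, dates[1:])
    }


prices = {"2016-03-01": 100.0, "2016-03-02": 110.0, "2016-03-03": 99.0}
rets = returns(prices)
```

The result has one point fewer than the input, because the first date has no predecessor to compute a return from.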

Examples

Some examples of instructions are:

  • data acquisition:
    • download prices from third party data providers
    • download prices from the internet
    • download prices from a proprietary in-house data source
    • cache prices in memory with a predefined timeout
  • data validation:
    • test for missing data points
    • check age of last available data point
  • data cleaning:
    • remove outliers
    • backfill missing data points with the last available value
    • enhance a series by filling missing data with data from an alternative data source
  • data transformation:
    • derive returns from prices
    • index prices to start at 100
    • normalize a risk index to take values between -1 and +1
    • convert frequency of data, e.g. from hourly bars to daily
    • regularize the time index, e.g. by setting it always to end-of-month for economic time series
  • data aggregation:
    • adjust price series for dividends
    • fill history with a proxy series
    • combine multiple price series into a custom index
    • stitch multiple futures contracts into a generically rolled series, using your custom algorithm
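Two of the steps listed above, filling missing points with the last available value and indexing prices to start at 100, could look like this in Python. This is only a sketch under stated assumptions: a series is a dict keyed by ISO date string, `None` marks a missing point, and the function names `fill_with_last` and `index_to_100` are made up for this example.

```python
from typing import Dict, Optional

# Assumption: None marks a missing data point.
Series = Dict[str, Optional[float]]


def fill_with_last(series: Series) -> Series:
    """Replace missing points with the last available value."""
    filled, last = {}, None
    for date in sorted(series):
        if series[date] is not None:
            last = series[date]
        # Leading gaps stay None, since no earlier value exists.
        filled[date] = last
    return filled


def index_to_100(series: Series) -> Series:
    """Rescale a series so that its first point equals 100."""
    dates = sorted(series)
    base = series[dates[0]]
    return {d: 100.0 * series[d] / base for d in dates}


raw = {"2016-03-01": 50.0, "2016-03-02": None, "2016-03-03": 55.0}
clean = index_to_100(fill_with_last(raw))
```

Because each step takes a series and returns a series, the two calls compose directly: cleaning runs first, and the transformation is applied to its output.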

Instruction Semantics

FiPi instructions carry no predefined semantic meaning. In other words, FiPi does not have a predefined set of instructions from which you can choose. Instead, each FiPi interpreter defines how it maps an instruction to an actual function. (See more about FiPi interpreters below.)

This can be seen as both good and bad:

  • It is bad for portability, as your definition/implementation of, say, Backfill might not correspond to someone else’s
  • It is good for flexibility, as you are free to implement any instruction you can come up with
  • It makes FiPi very lightweight, yet powerful, as you can use any pre-existing functionality and map it to an instruction

Why did we choose this approach? As much as we value portability, we gave it less weight than flexibility for FiPi. There are literally thousands of price sources, cleaning routines, etc. Organizations have invested large amounts to create libraries that perform these routines, and piggy-backing on them is of the highest priority for FiPi.

Also, if portability is needed, there are sound strategies to ensure it. For example, you can write a library with implementations of your instructions, and distribute it together with your FiPi files.
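The mapping from instruction names to functions described in this section can be sketched as a simple registry. The following Python example is a hypothetical illustration, not how any particular FiPi interpreter works: `REGISTRY`, the `instruction` decorator, `run`, and the instruction name "Scale" are all assumptions made for this sketch.

```python
from typing import Callable, Dict

# Hypothetical registry: instruction name -> implementing function.
REGISTRY: Dict[str, Callable] = {}


def instruction(name):
    """Decorator that registers a function under an instruction name."""
    def register(func):
        REGISTRY[name] = func
        return func
    return register


@instruction("Scale")
def scale(series, factor=1.0):
    # Any pre-existing library function could be wrapped here instead.
    return [x * factor for x in series]


def run(name, *inputs, **params):
    # Look up the implementation; an unknown name is an interpreter error.
    if name not in REGISTRY:
        raise KeyError(f"no implementation registered for {name!r}")
    return REGISTRY[name](*inputs, **params)


result = run("Scale", [1.0, 2.0], factor=10.0)
```

A library of such registered functions, shipped together with the FiPi files, is one way to achieve the portability strategy mentioned above.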

Processing Tree

Multiple instructions are combined into a processing tree. There is exactly one processing tree per timeseries.

Technically, a processing tree consists of a number of nodes, where each node is an instruction.

The reason for having a processing tree (as opposed to, say, a processing pipe) is that, often, we want to aggregate multiple timeseries into a single timeseries. And here, a tree structure comes in very handy.

The processing order of a processing tree is from the leaves to the root.

For example, each leaf of a processing tree could define a data source, and nodes closer to the root define validation, transformation and aggregation steps. The root of the tree then represents the result of the processing tree, which typically is a single timeseries.

More specifically, a stylized example processing tree could be structured like this:
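As an illustrative sketch only, such a tree can be modelled in Python as nested nodes that are evaluated from the leaves to the root. The `Node` class, the `evaluate` function, and the instruction names "SourceA", "SourceB", "Validate" and "Average" are assumptions made for this example and are not defined by FiPi; the leaves return hard-coded toy data instead of contacting a data provider.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One node of a processing tree: an instruction plus its child nodes."""
    name: str
    func: Callable
    children: List["Node"] = field(default_factory=list)


def evaluate(node):
    # Post-order traversal: children (towards the leaves) are evaluated first,
    # and their outputs become the inputs of the parent instruction.
    inputs = [evaluate(child) for child in node.children]
    return node.func(*inputs)


def non_empty(series):
    """Validation step: reject empty series, pass others through unchanged."""
    if not series:
        raise ValueError("empty series")
    return series


# Leaves act as data sources.
source_a = Node("SourceA", lambda: [10.0, 11.0, 12.0])
source_b = Node("SourceB", lambda: [20.0, 19.0, 18.0])

# Inner nodes validate each source separately.
checked_a = Node("Validate", non_empty, [source_a])
checked_b = Node("Validate", non_empty, [source_b])

# Root: aggregate both inputs into an equally weighted average.
root = Node(
    "Average",
    lambda a, b: [(x + y) / 2 for x, y in zip(a, b)],
    [checked_a, checked_b],
)

result = evaluate(root)
```

The root has two children, which is exactly the case a linear pipe cannot express: two independently sourced and validated series are merged into one.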

FiPi Definition File

A FiPi Context is defined in a YAML file.

FiPi Interpreters

A FiPi interpreter is a piece of software that reads a FiPi Definition File and provides an API to call a processing tree in order to load and preprocess financial data.

For example, in pseudocode, this might look as follows:

fipi = LoadContext("C:/temp/myfipi.yaml")
aapl = LoadData("AAPLadj", fipi)
Plot(aapl)

Reference Implementation

The reference implementation of a FiPi interpreter is written in the R programming language and can be obtained from GitHub. We hope to add FiPi interpreters in other languages in the near future. Please get in touch if you are interested in collaborating with us. As FiPi is very lightweight, implementing an interpreter is straightforward.